Received signal strength


Clustering is a significant idea for extending scalability and improving energy efficiency in the mobile ad-hoc network (MANET). Clustering also reduces the cost of communication. However, re-clustering is expensive, and frequent re-clustering causes extra routing overhead and extra energy utilization. To solve these issues, received signal strength indication (RSSI) based clustering and aggregating data (RCAD) using Q-learning in MANET is proposed. In this approach, we build the clusters by node RSSI. A fuzzy logic system (FLS) selects the cluster head (CH) based on node mobility and node utilization energy. Q-learning-based data aggregation is then used to improve the routing efficiency of mobile nodes in the MANET: each node finds an optimum next-hop node using Q-values established on the rewards (RD). The RD rule decides the best solution for the Q-learning technique; the RD is computed from the present bandwidth (PB), present energy (PE), present packet delivery (PDD), and hop count (HC) parameters to select the data aggregator from sender to receiver. The experimental outcomes show that RCAD increases the number of CH rounds by 155 and raises the cluster lifetime by 24% in the MANET.
1. INTRODUCTION


MANETs are self-organized networks without any fixed infrastructure. Topology changes are very frequent in MANETs due to node mobility. Topology maintenance creates extra overhead, as the mobility information of a single node is shared with all nodes in the network. To address the topology maintenance overhead in MANETs, researchers have proposed various cluster-based algorithms that reduce the size of the routing table [1]. Clusters are formed so that topology changes are handled locally within each cluster. If a node wants to communicate with a node outside its cluster, it communicates only with its CH, and the CH communicates with other CHs to transmit the data toward the destination. To use the clustering mechanism efficiently, stable and balanced clusters are required. Metrics such as relative mobility (node speed and direction), node degree, residual energy, communication workload, and neighbor behavior are required to form good-quality, optimized clusters [2].
In a MANET, every node acts autonomously and forwards data packets efficiently: the sender sends the data to the destination via intermediate nodes. In practice, however, many issues arise because of the absence of centralized management and the movement of the nodes [3]. MANET applications are growing rapidly, and MANETs are now able to offer several services. A major enabler of MANETs is the availability of radios that can adapt to channel conditions and communicate at multiple data rates; however, this creates additional load [4]. When the MANET is large and nodes move randomly, clustering provides scalability. On the other hand, clustering has its own limitations because of cluster structure and management [5]. The efficiency of clustering is determined by the fundamental trade-off between the scalability it achieves and its cost [6]. Information exchange is associated with local events, such as energy drain or node mobility. Several clustering approaches rebuild the clusters entirely and re-elect the CHs [7]. In another approach, a genetic algorithm (GA) evaluates a fitness function to obtain an optimized route. It provides an optimization procedure that chooses efficient routes with the greatest fitness values, based on the highest remaining energy and minimum data traffic. However, this approach increases the hop count [8].

The load-balanced clustering infrastructure (LBCI) approach is designed to enhance network capacity. Here, integer linear programming finds a feasible solution and provides timely data distribution. This approach estimates the delay by combining the estimated value with the actual past value, to correct inaccuracies in the measured value. A distributed data-scheduling algorithm then makes use of the limited bandwidth. However, the delay correction may be affected by specific environmental factors, and the approach increases energy utilization [9]. Reinforcement learning and a heuristic algorithm have been used to choose the neighbors that forward a packet to the destination, with reinforcement learning forecasting node behavior [10]. The Q-learning algorithm computes the action value. A bi-objective intelligent routing approach minimizes a long-run cost function that includes both path energy cost and delay, and its multi-agent reinforcement learning technique evaluates optimal routing without information about the system's statistics [11]. Data gathering and data aggregation are key issues in the network. Data aggregation approaches support efficient dynamic updates, and aggregation methods allow the data structure to be updated efficiently while managing the routing function [12]. This article is structured as follows: Section 2 describes received signal strength indication (RSSI) based clustering and aggregating data using Q-learning in MANET, Section 3 contains the simulation results, and Section 4 presents the conclusion.

An ant colony optimization (ACO) based fuzzy logic (F-ANT) approach discovers an efficient route based on node bandwidth, node congestion rate, and RSS. By finding the most efficient route, this approach increases packet delivery and reduces routing overhead; however, it still incurs additional routing overhead and lacks scalability [13]. An intelligent naive Bayesian probabilistic estimation has been used to build stable clusters. This scheme enhances routing through traffic awareness: the CH is chosen from the path with the highest traffic to enhance stability and lifespan [14]. A self-organization-based clustering approach enhances network stability and scalability by applying the bio-inspired flocking behavior of birds to cluster formation and management. It minimizes congestion, enhances performance, and reduces additional energy utilization [15]. An adaptive geographic routing approach improves transmission quality; however, its use of greedy forwarding to counter local maxima raises network delay [16].

A structure-free approach has been proposed for replica-insensitive data aggregation. It uses a token that executes a self-repelling random walk and aggregates data from the nodes, and it also reduces message overhead [17]. A peer-to-peer data dissemination approach has been implemented to enhance network capacity; it delivers data among nodes in a timely manner [18]. The approach in [19] increases the packet delivery rate and reduces delay; however, it does not consider packet length [19]. Hybrid distributed monitoring is a hierarchical technique for query dissemination and data aggregation, in which a gossip-based method complements the hierarchical topology to complete the data aggregation and provide robustness and stability in MANETs [20]. BlockTree is a monitoring approach built on location-aware delivery and information aggregation. However, this approach offers precise results in the communication medium [21]. The census approach is used to aggregate data and avoids high overhead under node mobility [22]. Adaptive fuzzy multiple attribute decision routing determines a fuzzy score based on direction, distance, location, and density, and transmits data packets according to this score. It selects stable routes with a high convergence rate and speed, and it reduces network delay. However, this approach cannot select a better node under high traffic load [23]. The volunteer nodes of ant colony optimization (VNACO) approach is used to minimize delay. In a MANET, when a node moves out of transmission range, a volunteer node that overhears the lost data packet forwards this information to the transmitting node. In this approach, node connectivity, energy, transmission processing time, and available bandwidth are used as metrics to select the volunteer nodes, and ant colony optimization provides the optimal route and minimizes routing overhead. However, this approach increases energy consumption [24]. Another approach uses an auto-encoder to assist a fuzzy clustering technique, overcoming the drawback that clustering performance is easily affected by the number of clusters [25]. An Efficient Self-Reconfiguration and Route Selection approach is used to alleviate link failures; it continuously observes energy efficiency and enhances network performance [26]. High-speed mobility is a severe challenge for mobile nodes in a MANET [27]. An improved uplink throughput and energy efficiency approach attempts to optimize throughput and energy in clustering [28].


2. RSSI BASED CLUSTERING AND AGGREGATING DATA USING Q-LEARNING 

A MANET contains a number of mobile nodes that are deployed in a specific region and move freely in any direction. Each mobile node is assigned a unique ID and learns about the neighboring nodes inside its communication range via HELLO messages, maintaining a table of neighbor details. In a MANET, node mobility is a significant factor because node movement cannot be predicted easily. Energy utilization is also a significant factor because once a node's energy is completely drained, the node is dead.
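The neighbor table built from HELLO messages can be sketched as follows. This is a minimal illustration under stated assumptions: the `NeighborTable` class, its field names, and the 3-second expiry timeout are hypothetical, since the paper does not specify how long a neighbor stays in the table without a fresh HELLO.

```python
from dataclasses import dataclass, field


@dataclass
class NeighborEntry:
    node_id: int
    rssi_dbm: float   # RSSI measured on the most recent HELLO from this neighbor
    last_seen: float  # time (s) at which that HELLO was received


@dataclass
class NeighborTable:
    """Hypothetical neighbor table kept by one mobile node."""

    timeout_s: float = 3.0  # assumed expiry interval; not specified in the paper
    entries: dict = field(default_factory=dict)

    def on_hello(self, sender_id: int, rssi_dbm: float, now: float) -> None:
        # Add a new neighbor, or refresh the RSSI and timestamp of a known one.
        self.entries[sender_id] = NeighborEntry(sender_id, rssi_dbm, now)

    def purge(self, now: float) -> None:
        # Forget neighbors whose HELLOs stopped arriving, e.g. because they
        # moved out of communication range.
        self.entries = {
            nid: e for nid, e in self.entries.items()
            if now - e.last_seen <= self.timeout_s
        }

    def neighbors(self) -> list:
        return sorted(self.entries)


table = NeighborTable()
table.on_hello(7, -62.0, now=0.0)
table.on_hello(9, -80.5, now=1.0)
table.purge(now=3.5)      # node 7 was last heard 3.5 s ago, beyond the timeout
print(table.neighbors())  # [9]
```

Storing the RSSI alongside each entry lets the same table feed the RSSI-based cluster formation described next.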


2.1. Clustering 

Clustering is a significant idea for extending scalability and improving energy efficiency in the MANET. Clustering also reduces the cost of communication. In a clustering scheme, the clusters are initially assembled to perform their tasks for one round, after which re-clustering is performed. This re-clustering procedure is expensive, and the network database is completely renewed. Furthermore, frequent re-clustering causes extra routing overhead and extra energy utilization. To solve these issues, a fuzzy logic system (FLS) is used to extend the MANET lifetime. In this approach, each cluster contains several mobile nodes and a CH, and we build the clusters by node RSSI. Figure 1 illustrates the block diagram of RSSI-based clustering and aggregating data (RCAD).


[Figure 1 diagram; labeled elements: sender transmits the data, data aggregator, mobile nodes]


Figure 1. Block diagram of RCAD


RSSI is a valuable parameter for recognizing the distance between a CH and its neighbor. A lower RSSI value indicates that the node is located far from the CH, while a higher RSSI value indicates that the node is not far from the sender. The RSSI value is divided into three levels: low, medium, and high. If the RSSI value is low, the node cannot join this cluster and remains a non-cluster member. If the RSSI value is medium, the node is a candidate cluster member. If the RSSI value is high, the node is selected as a cluster member. Algorithm 1 shows the categorization of MANET nodes.
Algorithm 1: Categorize the MANET nodes
For each node
    M = minimum RSSI level;
    A = average RSSI level;
    G = greater RSSI level;
    If (a request message is received)
        Measure the RSSI value;
        If (RSSI = M) then
            Node is not selected as a cluster member;
            return node status
        If (RSSI = A) then
            Node is a candidate cluster member;
            return node status
        If (RSSI = G) then
            Node is confirmed as a cluster member;
            return node status
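Algorithm 1 can be made executable as the following sketch. The dBm thresholds that separate the three levels (`LOW_MAX_DBM`, `HIGH_MIN_DBM`) are hypothetical values chosen only for illustration, because the paper names the levels M, A, and G but gives no numeric boundaries; the function names are likewise illustrative.

```python
from enum import Enum


class Status(Enum):
    NON_MEMBER = "not selected as a cluster member"  # RSSI level M
    CANDIDATE = "candidate cluster member"           # RSSI level A
    MEMBER = "confirmed cluster member"              # RSSI level G


# Hypothetical level boundaries in dBm; the paper gives no numeric values.
LOW_MAX_DBM = -85.0   # below this: minimum level M
HIGH_MIN_DBM = -70.0  # at or above this: greater level G


def rssi_level(rssi_dbm: float) -> str:
    """Quantize a measured RSSI into the levels M, A, or G."""
    if rssi_dbm < LOW_MAX_DBM:
        return "M"
    if rssi_dbm < HIGH_MIN_DBM:
        return "A"
    return "G"


def categorize(rssi_dbm: float) -> Status:
    """Status of one node whose reply to the request message had this RSSI."""
    level = rssi_level(rssi_dbm)
    if level == "M":
        return Status.NON_MEMBER
    if level == "A":
        return Status.CANDIDATE
    return Status.MEMBER


def categorize_all(rssi_by_node: dict) -> dict:
    # The "For each node" loop of Algorithm 1.
    return {node: categorize(rssi) for node, rssi in rssi_by_node.items()}


result = categorize_all({1: -60.0, 2: -78.0, 3: -92.0})
for node, status in result.items():
    print(node, status.name)
```

Quantizing first and then branching on the level keeps the structure of Algorithm 1 while making the meaning of "RSSI = M" explicit as a range test rather than an equality.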


2.2. CH election 


The CH is elected according to node mobility and node energy utilization, which are explained below. Minimum node mobility (NM) and minimum node utilization energy (NUE) yield the maximum round length. Figure 2 illustrates the FLS of RCAD: node mobility and node energy utilization are given as inputs, and the output is the length of the round (LR). The length of the round is computed as:


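$$LR = \max\left(\mathrm{round}\left(LR_{\max} \cdot \mathrm{FIS}(NM, NUE)\right), 1\right) \qquad (1)$$


[Figure 2 diagram: inputs (mobility, energy) pass through fuzzification, an inference system with a rule base, and defuzzification]


Figure 2. FLS of RCAD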
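Equation (1) and the subsequent CH choice (the node with the maximum LR) can be sketched as below. This assumes the crisp FIS output is scaled to [0, 1]; the value `lr_max=10` and the per-node FIS outputs in the example are hypothetical, as is the tie-breaking rule (smallest node ID wins), which the paper does not specify.

```python
def round_length(fis_output: float, lr_max: int) -> int:
    """Equation (1): LR = max(round(LR_max * FIS(NM, NUE)), 1).

    fis_output is assumed to be the crisp FIS output scaled to [0, 1].
    Note: Python's round() rounds exact halves to the nearest even integer.
    """
    return max(round(lr_max * fis_output), 1)


def elect_ch(fis_by_node: dict, lr_max: int) -> int:
    """Return the node with the largest LR; ties go to the smallest node ID."""
    lr = {node: round_length(f, lr_max) for node, f in fis_by_node.items()}
    return max(sorted(lr), key=lambda node: lr[node])


# Hypothetical FIS outputs for three nodes and an assumed LR_max of 10 rounds.
ch = elect_ch({4: 0.82, 5: 0.31, 6: 0.04}, lr_max=10)
print(ch)  # 4
```

The outer `max(..., 1)` in Equation (1) guarantees that even a node with a very small FIS output is assigned at least one round.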


Every node is informed of LR_max, and the node with the maximum LR value is chosen as the CH. The fuzzy set for the NUE input variable has the linguistic values very low, low, middle, enough, and high. The fuzzy set for the mobility input variable has the values near, enough, and far away. The inputs are NM and NUE, and the output variable LR takes the values very high, high, middle, small, and very small.


Table 1. Fuzzy mapping rules

Node mobility    Node utilized energy    Length of round
near             very low                very high
near             low                     high
near             middle                  middle
near             high                    small
enough           very low                very high
enough           low                     high
enough           middle                  middle
enough           high                    small
far away         very low                small
far away         low                     small
far away         enough                  very small
far away         high                    very small


In this approach, heuristic knowledge is used to define the predefined fuzzy rules, following the principle that a mobile node with the lowest mobility and minimum energy utilization yields a higher LR. Table 1 lists the fuzzy mapping rules. The defuzzification process produces a single crisp number; the defuzzification of the FIS is performed with the center of area (CoA) method.
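A Mamdani-style FIS built from the Table 1 rules with center of area (CoA) defuzzification can be sketched as follows. The paper does not give membership function shapes, so the triangular functions on normalized [0, 1] universes below are hypothetical, as are the min/max operators used for rule firing and aggregation (a common Mamdani choice, not one the paper states).

```python
# Hypothetical triangular membership functions on normalized [0, 1] universes.
MOBILITY = {"near": (0.0, 0.0, 0.5), "enough": (0.0, 0.5, 1.0),
            "far away": (0.5, 1.0, 1.0)}
ENERGY = {"very low": (0.0, 0.0, 0.25), "low": (0.0, 0.25, 0.5),
          "middle": (0.25, 0.5, 0.75), "enough": (0.5, 0.75, 1.0),
          "high": (0.75, 1.0, 1.0)}
LR_OUT = {"very small": (0.0, 0.0, 0.25), "small": (0.0, 0.25, 0.5),
          "middle": (0.25, 0.5, 0.75), "high": (0.5, 0.75, 1.0),
          "very high": (0.75, 1.0, 1.0)}

# Table 1: (node mobility, node utilized energy) -> length of round.
RULES = [
    ("near", "very low", "very high"), ("near", "low", "high"),
    ("near", "middle", "middle"), ("near", "high", "small"),
    ("enough", "very low", "very high"), ("enough", "low", "high"),
    ("enough", "middle", "middle"), ("enough", "high", "small"),
    ("far away", "very low", "small"), ("far away", "low", "small"),
    ("far away", "enough", "very small"), ("far away", "high", "very small"),
]


def tri(x, a, b, c):
    """Triangular membership with feet a, c and peak b (a <= b <= c)."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fis(nm, nue, steps=101):
    """Mamdani inference over Table 1 with center-of-area defuzzification."""
    nm, nue = min(max(nm, 0.0), 1.0), min(max(nue, 0.0), 1.0)
    # Firing strength of each rule = min of its antecedent memberships;
    # rules sharing a consequent are combined with max.
    strength = {}
    for m_label, e_label, out_label in RULES:
        w = min(tri(nm, *MOBILITY[m_label]), tri(nue, *ENERGY[e_label]))
        strength[out_label] = max(strength.get(out_label, 0.0), w)
    # Clip each output set at its strength, aggregate with max, and take the
    # centroid over a discretized output universe.
    num = den = 0.0
    for i in range(steps):
        y = i / (steps - 1)
        mu = max(min(w, tri(y, *LR_OUT[label])) for label, w in strength.items())
        num += y * mu
        den += mu
    # Table 1 does not cover every input combination; return 0 if no rule fires.
    return num / den if den > 0 else 0.0


for nm, nue in [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]:
    print(nm, nue, round(fis(nm, nue), 3))
```

Under these assumptions, a slow node with low energy utilization produces a crisp output near the top of the range, which is the behavior the heuristic rule above describes; the crisp value can then serve as FIS(NM, NUE) in Equation (1).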
2.3. Data gathering

After the CH is selected, the aggregation function is executed based on the QL procedure. QL is a type of machine learning that copes with node mobility and determines optimal behavior to attain its goals by learning the QoS parameters and its interactions. The goal of QL is to maximize the reward of an agent through actions taken in response to the MANET. Here, every mobile node acts as an agent that can make decisions and discover a possible path to reach any destination. The outcome of the agent's route selection is used to reward or penalize the corresponding routing decision, so that better decisions are reinforced through rewards (RDs) and worse decisions are rejected through penalties.

QL can overcome the limitations of static routing because it captures node movement efficiently. The action at every mobile node is the selection of the forwarder node for transmitting the information toward the receiver node. QL is used to learn the optimal action-selection policy using a Q value (QV), which denotes the expected future reward of an action. We monitor the routing decisions, in which better results are chosen through RDs and worse decisions are rejected through penalties. When a route must be initiated and the receiver is not in the sender node's vicinity, the node first consults its route table and QoS state tables. Here, S represents the states and A represents the actions. The agents learn the environment through transmissions from one agent to another. The action chosen in a given state is the one that maximizes the weighted RDs, comprising present and future RDs.
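The greedy choice of a forwarder from Q-values can be sketched as follows. The `QRouter` class and its dictionary keyed by (destination, neighbor) are hypothetical illustrations of a per-node Q-table; the sketch is purely greedy, as described above, and does not model any exploration strategy, which the paper does not describe.

```python
class QRouter:
    """Hypothetical per-node agent holding Q[(destination, neighbor)] values."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.q = {}  # (destination, neighbor) -> Q-value; unseen pairs count as 0.0

    def best_next_hop(self, destination, neighbors):
        """Greedy action selection: the neighbor with the highest Q-value."""
        if destination in neighbors:
            return destination  # receiver is within range: deliver directly
        if not neighbors:
            return None         # no forwarder currently available
        # max() keeps the first neighbor listed among equally valued ones.
        return max(neighbors, key=lambda n: self.q.get((destination, n), 0.0))


router = QRouter(node_id=1)
router.q[(9, 2)] = 0.4
router.q[(9, 3)] = 0.7
print(router.best_next_hop(9, [2, 3, 4]))  # 3
```

Each node keeps only its own table, which matches the decentralized design in which every mobile node acts as an agent.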

In traditional routing approaches, a centralized controller acts as the agent that manages and observes the network; as a result, both cost and routing overhead increase, and identifying the status of the MANET becomes complex. The proposed approach has no central agent: every mobile node acts as an agent, and nodes cooperatively share information with their neighbors so that every mobile node learns the state-transition behavior. The proposed approach is defined by the elements {S, A, RD, P}, where S denotes the state, A the action, RD the reward, and P the probability of communication. Let s_i denote the present state and s_m the next state; the actions of the present state correspond to the neighbor node list. Let t denote the time the aggregated information waits before it is forwarded to the chosen next neighbor node. Let N represent the number of mobile nodes and NL the list of neighbor nodes. The states and actions are defined as follows.




Figure 3. State diagram of RCAD 


In QL, the QV-table helps to find the best action for every state: the action-value function Q(s, a) gives the present and future RDs obtained when action a is executed in state s. Suppose the agent chooses an action a in state s, receives RD, and moves to a new state s'. The QV Q(s, a) is then updated as follows:


$$QV(s,a) \leftarrow (1-\alpha)\,QV(s,a) + \alpha\left[RD + \beta \max_{a'} QV(s',a')\right] \qquad (2)$$


Here, α denotes the learning rate and β is the discount factor for future RDs. Figure 3 illustrates the states and actions of the RCAD approach. Suppose the action forwards the aggregated information from the present state to the next state: the RD is assigned to the present state s, and the QV-table entry for state s is updated. However, the present state s does not hold the QV-table of the next state, which it needs to update its own Q-table; that value reflects the efficiency of data aggregation and the QoS and energy at the next node selection, and it is calculated at the next state. Hence, when the next node acknowledges the aggregated information to the sender, it also reports its maximum Q-value and the calculated reward RD. The RD rule is thus used to decide the best QL solution. Here, we compute the RD from the present bandwidth (PB), present energy (PE), present packet delivery (PDD), and hop count (HC) from sender to receiver. The RD value is computed as follows:
Here, an additional discount factor is applied to the node's reward, which is required to avoid backward forwarding. When the present energy of the next-state node is relatively large and the distance between the present-state and next-state nodes is small, energy utilization in the network is minimized. Discount factors range between 0 and 1.
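The Q-value update of Equation (2), applied when the next node acknowledges the aggregated data, can be sketched as follows. The learning rate and discount factor defaults (0.5 and 0.8) are hypothetical, and the acknowledgment is assumed to carry the neighbor's maximum Q-value and the reward RD, as the text describes. RD is taken as a given number; the sketch does not attempt to reproduce how RD is computed from PB, PE, PDD, and HC.

```python
def q_update(q_sa, rd, max_q_next, alpha=0.5, beta=0.8):
    """Eq. (2): QV(s,a) <- (1 - alpha) QV(s,a) + alpha [RD + beta max_a' QV(s',a')]."""
    if not (0.0 <= alpha <= 1.0 and 0.0 <= beta <= 1.0):
        raise ValueError("alpha and beta must lie in [0, 1]")
    return (1.0 - alpha) * q_sa + alpha * (rd + beta * max_q_next)


def on_ack(q_table, destination, neighbor, rd, neighbor_max_q,
           alpha=0.5, beta=0.8):
    """Update Q[(destination, neighbor)] when `neighbor` acknowledges data.

    The acknowledgment is assumed to carry the neighbor's maximum Q-value
    toward `destination` and the reward RD computed at the next state.
    """
    key = (destination, neighbor)
    q_table[key] = q_update(q_table.get(key, 0.0), rd, neighbor_max_q,
                            alpha, beta)
    return q_table[key]


q_table = {}
first = on_ack(q_table, 9, 3, rd=1.0, neighbor_max_q=0.5)
second = on_ack(q_table, 9, 3, rd=1.0, neighbor_max_q=0.5)
print(round(first, 3), round(second, 3))  # 0.7 1.05
```

Because the neighbor reports its own maximum Q-value in the acknowledgment, the sender never needs the neighbor's full Q-table, which is the constraint the text points out.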


3. SIMULATION ANALYSIS 

The simulations compare RCAD with the VNACO and LBCI approaches. We used the NS-2 network simulator (version 2.35) to simulate the presented clustering technique. We also used the random waypoint mobility model to create the RCAD scenario and the traffic flow, with node speeds from 5 m/s to 25 m/s.

The packet delivery ratio reflects the number of packets successfully delivered to the destination node. As shown in Figure 4, the packet delivery ratio is higher for RCAD than for the LBCI and VNACO approaches. Because Q-learning-based data aggregation minimizes packet losses, packet delivery increases.
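The paper does not state the formulas behind Figures 4 and 5; the sketch below assumes the conventional definitions (delivered packets divided by sent packets, expressed as a percentage, and the complementary loss fraction). The packet counts in the example are hypothetical and are not simulation results.

```python
def delivery_and_loss(sent: int, delivered: int) -> tuple:
    """Return (packet delivery ratio in %, packet loss ratio in [0, 1])."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    if not 0 <= delivered <= sent:
        raise ValueError("delivered must be between 0 and sent")
    pdr = 100.0 * delivered / sent
    plr = (sent - delivered) / sent
    return pdr, plr


print(delivery_and_loss(2000, 1880))  # (94.0, 0.06)
```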


[Figure 4 plot: packet delivery ratio (%) versus node mobility (m/s)]


Figure 4. Packet delivery ratio of VNACO, LBCI, and RCAD based on node mobility


As illustrated in Figure 5, the packet loss ratio of RCAD is lower than that of the LBCI and VNACO approaches. LBCI and VNACO have the greatest packet losses in the MANET because these approaches increase energy consumption. In contrast, RCAD forms the clusters by node RSS and selects data forwarders by the Q-learning method, thus minimizing packet losses in the MANET.


[Figure 5 plot: packet loss ratio versus node mobility (m/s), axis ticks 10 to 25 m/s; legend includes LBCI and RCAD]


Figure 5. Packet loss ratio of VNACO, LBCI, and RCAD based on node mobility
Figure 6 compares the CH lifetime of VNACO, LBCI, and RCAD at various node speeds. As node mobility increases, the CH lifetime of VNACO, LBCI, and RCAD decreases. This indicates that the required QoS cannot be guaranteed because routes break or nodes move to other locations. Nevertheless, the CH lifetime of RCAD ranges from 700 to 650 seconds, whereas that of VNACO and LBCI ranges from below 500 down to 425 seconds.


[Figure 6 plot: CH lifetime (s) versus node mobility (m/s)]


Figure 6. CH lifetime of VNACO, LBCI, and RCAD based on node mobility


Figure 7 shows the cluster rounds of VNACO, LBCI, and RCAD as a function of node count. As the node count increases, the number of cluster rounds also increases. RCAD achieves more cluster rounds than the LBCI and VNACO approaches because it chooses the CH by round length, which is computed from node mobility and minimum utilized energy. As a result, the number of CH rounds in the MANET increases.


[Figure 7 plot: cluster rounds versus node count]


Figure 7. Cluster rounds of VNACO, LBCI, and RCAD based on node count


Figure 8 shows the remaining energy of VNACO, LBCI, and RCAD as a function of node mobility. The remaining energy of VNACO is much lower than that of RCAD and LBCI. RCAD selects the CH based on node utilized energy, which reduces CH death in the network. LBCI also consumes more energy than RCAD.
Figure 8. Remaining energy of VNACO, LBCI, and RCAD based on node mobility


4. CONCLUSION 

This paper presents RSSI-based clustering and aggregating data using Q-learning. In this approach, network nodes are categorized by RSSI to form the clusters. Node utilized energy and node mobility are used to measure the length of the round, which is evaluated by the fuzzy fitness function. The Q-learning method is used to aggregate data from sender to receiver; its goal is to maximize the reward of an agent through actions taken in response to the MANET. In this approach, the mobile nodes find an optimum next-hop node using their QVs, which are established on the RDs. The RD is computed from bandwidth, energy, packet delivery, and hop count to select an efficient data aggregator. Experimental results show that RCAD increases the remaining network energy and the packet delivery ratio. Furthermore, it extends both the cluster lifetime and the number of CH rounds in the MANET.